{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# GitHub\n",
    "\n",
    "This notebooks shows how you can load issues and pull requests (PRs) for a given repository on [GitHub](https://github.com/). Also shows how you can load github files for a given repository on [GitHub](https://github.com/). We will use the LangChain Python repository as an example."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup access token"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To access the GitHub API, you need a personal access token - you can set up yours here: https://github.com/settings/tokens?type=beta. You can either set this token as the environment variable ``GITHUB_PERSONAL_ACCESS_TOKEN`` and it will be automatically pulled in, or you can pass it in directly at initialization as the ``access_token`` named parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# If you haven't set your access token as an environment variable, pass it in here.\n",
    "from getpass import getpass\n",
    "\n",
    "ACCESS_TOKEN = getpass()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Issues and PRs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import GitHubIssuesLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = GitHubIssuesLoader(\n",
    "    repo=\"langchain-ai/langchain\",\n",
    "    access_token=ACCESS_TOKEN,  # delete/comment out this argument if you've set the access token as an env var.\n",
    "    creator=\"UmerHA\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's load all issues and PRs created by \"UmerHA\".\n",
    "\n",
    "Here's a list of all filters you can use:\n",
    "- include_prs\n",
    "- milestone\n",
    "- state\n",
    "- assignee\n",
    "- creator\n",
    "- mentioned\n",
    "- labels\n",
    "- sort\n",
    "- direction\n",
    "- since\n",
    "\n",
    "For more info, see https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(docs[0].page_content)\n",
    "print(docs[0].metadata)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Only load issues"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, the GitHub API returns considers pull requests to also be issues. To only get 'pure' issues (i.e., no pull requests), use `include_prs=False`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = GitHubIssuesLoader(\n",
    "    repo=\"langchain-ai/langchain\",\n",
    "    access_token=ACCESS_TOKEN,  # delete/comment out this argument if you've set the access token as an env var.\n",
    "    creator=\"UmerHA\",\n",
    "    include_prs=False,\n",
    ")\n",
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(docs[0].page_content)\n",
    "print(docs[0].metadata)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Github File Content\n",
    "\n",
    "For below code, loads all markdown file in rpeo `langchain-ai/langchain`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.document_loaders import GithubFileLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = GithubFileLoader(\n",
    "    repo=\"langchain-ai/langchain\",  # the repo name\n",
    "    access_token=ACCESS_TOKEN,\n",
    "    github_api_url=\"https://api.github.com\",\n",
    "    file_filter=lambda file_path: file_path.endswith(\n",
    "        \".md\"\n",
    "    ),  # load all markdowns files.\n",
    ")\n",
    "documents = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "example output of one of document: \n",
    "\n",
    "```json\n",
    "documents.metadata: \n",
    "    {\n",
    "      \"path\": \"README.md\",\n",
    "      \"sha\": \"82f1c4ea88ecf8d2dfsfx06a700e84be4\",\n",
    "      \"source\": \"https://github.com/langchain-ai/langchain/blob/master/README.md\"\n",
    "    }\n",
    "documents.content:\n",
    "    mock content\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
